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We propose a method of analysis of dynamical networks based on a recent measure of Granger 
causality between time series, based on kernel methods. The generalization of kernel Granger 
causality to the multivariate case, here presented, shares the following features with the bivariate 
measures: (i) the nonlinearity of the regression model can be controlled by choosing the kernel 
function and (ii) the problem of false-causalities, arising as the complexity of the model increases, 
is addressed by a selection strategy of the eigenvectors of a reduced Gram matrix whose range 
represents the additional features due to the second time series. Moreover, there is no a priori 
assumption that the network must be a directed acyclic graph. We apply the proposed approach 
to a network of chaotic maps and to a simulated genetic regulatory network: it is shown that 
the underlying topology of the network can be reconstructed from time series of node's dynamics, 
provided that a sufficient number of samples is available. Considering a linear dynamical network, 
built by preferential attachment scheme, we show that for limited data use of bivariate Granger 
causality is a better choice w.r.t methods using LI minimization. Finally we consider real expression 
data from HeLa cells, 94 genes and 48 time points. The analysis of static correlations between genes 
reveals two modules corresponding to well known transcription factors; Granger analysis puts in 
evidence nineteen causal relationships, all involving genes related to tumor development. 

PACS numbers: 05.45.Xt , 87.18.Vf 



I. INTRODUCTION 

Dynamical networks [l[ model physical and biological behavior in many applications; examples range from networks 
of neurons Q, Josephson junctions arrays [|| to genetic networks Q, protein interaction nets H and metabolic 
networks [6j . Synchronization in dynamical networks is influenced by the topology of the network [7[ . A great need 
exists for the development of effective methods of inferring network structure from time series data; a method for 
detecting the topology of dynamical networks, based on chaotic synchronization, has been proposed in [1]; a recent 
approach deals with the case of low number of samples and proposed methods rooted on LI minimization [9]. 

Granger causality has become the method of choice to determine whether and how two time series exert causal 
influences on each other • This approach is based on prediction: if the prediction error of the first time series is 
reduced by including measurements from the second one in the linear regression model, then the second time series 
is said to have a causal influence on the first one. This frame has been used in many fields of science, including 
neural systems [ll| , reo-chaos fl2| and cardiovascular variability [l3| . The estimation of linear Granger causality from 
Fourier and wavelet transforms of time series data has been recently addressed [lij • 

Kernel algorithms work by embedding data into a Hilbert space, and searching for linear relations in that space [ll| . 
The embedding is performed implicitly, by specifying the inner product between pairs of points [Tfl . We have recently 
exploited the properties of kernels to provide nonlinear measures of bivariate Granger causality [171 ] . We reformulated 
linear Granger causality and introduced a new statistical procedure to handle over-fitti ng [18| in the linear case. Our 
new formulation was then generalized to the nonlinear case by means of the kernel trick [161 ] , thus obtaining a method 
with the following two main features: (i) the nonlinearity of the regression model can be controlled by choosing the 
kernel function; (ii) the problem of false-causalities, which arises as the complexity of the model increases, is addressed 
by a selection strategy of the eigenvectors of a reduced Gram matrix whose range represents the additional features 
due to the second time series. 

In this paper we describe in detail the kernel Granger approach and address use of Granger causality to estimate, 
from multivariate time series data, the topology and the drive-response relationships of a dynamical network. To this 
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aim, we generalize our method in [T3| to the case of multivariate data. 

The paper is organized as follows. In the next section we describe the kernel Granger causality method in the 
bivariate case, adding some details to the presentation in [lTj. In section III we generalize the method to process 
multivariate time series data, and show that the proposed method can discern whether an interaction is direct or 
mediated by a third time series. The case of dynamical networks is described in section IV: we analyze a system of 
interacting chaotic maps and a model of gene regulatory networks. In Section V we consider systems with sparse 
connectivity and limited data, and compare the bivariate Granger approach with a multivariate method based on LI 
minimization; then we analyze a real data set, gene expressions of HeLa cells. Section V summarizes our conclusions. 



II. BIVARIATE GRANGER CAUSALITY 

In this section we review the kernel method for Granger causality proposed in [l7| . Let us start with the linear 
case. 



A. Linear Granger causality 

Suppose to model the temporal dynamics of a stationary time series {t; n }n=i,.,N+m by an autoregressive model of 
order m: 

m 

and by a bivariate autoregressive model which takes into account also a simultaneously recorded time series 

{Vn}n=l,.,N+m'- 

m m 
& = £ A 'l + 12 B 3 Vn-j + K- 

The coefficients of the models are calculated by standard least squares optimization; m is usually chosen according to 
Akaike criterion [l9j or embedding techniques from the theory of nonlinear dynamical systems [20j |. 

The concept of Granger causality is [lOj |: 77 Granger-causes £ if the variance of residuals E' is significantly smaller 
than the variance of residuals E, as it happens when coefficients Bj are jointly significantly different from zero. This 
can be tested by performing an F-test or Levene's test for equality of variances [2l| . An index measuring the strength 
of the causal interaction is then defined as: 

where (•) means averaging over n (note that (E) = (E') = 0). Exchanging the roles of the two time series, one may 
equally test causality in the opposite direction, i.e. to check whether £ Granger-causes r\. 
We use the shorthand notations: 

Xj = . . . , £, +m _i) , 



Yi = (r)i, . . .,77 i+m _i) 

and Xi = £i+mi f° r i = 1, . . . , iV. We treat these quantities as N realizations of the stochastic variables X, Y and 
x. Let us denote X the m x TV matrix having vectors Xi as columns, and Z the 2m x TV matrix having vectors 
Zi = (Xj , i^ T ) T as columns. The values of x are organized in a vector x=(xi, . . . , £./v) T . In full generality we assume 
that each component of X and Y has zero mean, and that vector x has zero mean and is normalized, i.e. x T x=l. 
Now, for each i = 1, . . . , N, we define: 

m 

X~i ^ Aj £j_|_ m _j, 
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The vectors x=(xi, . . . , af/v) T and x'=(xi', . . . , x~n') t are the estimated values by linear regression, in the two cases. 
It is easy to show that x and x' have the following geometrical interpretation. Let H C %t N be the range of the 
N x N matrix K = X T X. Then x is the projection of x on H. In other words, calling v 1; . . . , v m the (orthonormal) 
eigenvectors of K with non- vanishing eigenvalue and 

m 
i=l 

the projector on the space H, we have x = Px. Let us define y = x — Px. Analogously x' — P'x, P' being the 
projector on the 2m-dimensional space H' C $t N , equal to the range of the matrix K' = Z T Z. Moreover, it is easy to 
show that 

(2) 



1-x'x 

Now note that H C H' . Therefore we may decompose H' as follows: H' = H © H^, where H is the space of all 
vectors of H' orthogonal to all vectors of H. H corresponds to the additional features due to the inclusion of {77} 
variables. Calling P^ the projector on iP 1 , we can write: 

s ll^yll 2 ^ 

Now we note that H 1 - is the range of the matrix 

K = K' K P P (K' K'P) = K' PK' K'P + PK'P. 

Indeed, for any u S $l N , we have Ku = v Pv, where v = K' (I — P) u S H', and Ku S H . It follows that H 1 - 
is spanned by the set of the eigenvectors, with non vanishing eigenvalue, of K. Calling t l5 . . . , t m these eigenvectors, 
we have: 

rn 
i=l 

where r, is the Pearson's correlation coefficient of y and ti (since the overall sign of ti is arbitrary, we can assume 
that ri is positive). Let Tti be the probability that r*j is due to chance, obtained by Student's t test. Since we are 
dealing with multiple comparison, we use the Bonferroni correction to select the eigenvectors ti/, correlated with y, 
with expected fraction of false positive q (equal to 0.05). Therefore we calculate a new causality index by summing, 
in equation (4), only over the {?v} such that 7Tj< < q/m, thus obtaining a filtered linear Granger causality index: 

Sf = Y,4 . (5) 

i' 

It is assumed that Sf measures the causality 77 — > £, without further statistical test. 



B. Kernel Granger causality 

In this subsection we describe the generalization of linear Granger causality to the nonlinear case, using methods 
from the theory of Reproducing Kernel Hilbert Spaces (RKHS) [lijj ]. Given a kernel function K, with spectral 
representation K(X,X') = ^2 a ^a4>a(X)(j) a (X r ) (see Mercer's theorem [HI]), we consider H, the range of the N x N 
Gram matrix K with elements K(Xi, Xj). In order to make the mean of all variables <p a {X) zero, we replace 
K — > K — P K — KP + PoKP , where P is the projector onto the one-dimensional subspace spanned by the 
vector such that each component is equal to the unity [la ]; in the following we assume that this operation has been 
performed on each Gram matrix. As in the linear case, we calculate x, the projection of x onto H . Due to the 
spectral representation of K, x coincides with the linear regression of x in the feature space spanned by y/X^4> a , the 
eigenfunctions of K\ the regression is nonlinear in the original variables. 
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While using both X and Y to predict x, we evaluate the Gram matrix K' with elements = K(Zi,Zj). The 
regression values now form the vector x' equal to the projection of x on H', the range of K'. Before evaluating the 
filtered causality index, as in the linear case, we note that not all kernels may be used to evaluate Granger causality. 
Indeed if Y is statistically independent of X and x, then x' and x should coincide in the limit N — > oo. This property, 
invariance of the risk minimizer when statistically independent variables are added to the set of input variables, is 
satisfied only by suitable kernels, as discussed in [22}. In the following we consider two possible choices, which fulfill 
the invariance requirement. 

Inhomogeneous polynomial kernel. The inhomogeneous polynomial (IP) kernel of integer order p is: 

K p {X,X') = (1 + X T X') P . 

In this case the eigenfunctions are made of all the monomials, in the input variables, up to the p — th degree. The 
dimension of the space H is mi = 1/B(p + 1, m + 1) — 1, where B is the beta function, and p — 1 corresponds to 
the linear regression. The dimension of space H' is ni2 = 1/B{p + 1, 2m + 1) — 1. As in the linear case, we note that 
H C H' and decompose H' = H®H^. Subsequently we calculate K = K' - PK' - K'P + PK'P; the dimension of 
the range of K is 7713 = 7712 — mi. Along the same lines as those described in the linear case, we construct the kernel 
Granger causality taking into account only the eigenvectors of K which pass the Bonferroni test: 

# = (6) 

%' 

the sum being only over the eigenvectors of K with probability tiv < q/1713. 
Gaussian kernel. The Gaussian kernel reads: 

and depends on the width a. a controls the complexity of the model: the dimension of the range of the Gram matrix 
decreases as a increases. As in previous cases, we may consider H, the range of the Gram matrix K, and H' , the 
range of K', but in this case the condition H C H' would not necessarily hold; therefore some differences in the 
approach must be undertaken. We call L the mi-dimensional span of the eigenvectors of K whose eigenvalue is not 
smaller than /J,X m ax, where X m ax is the largest eigenvalue of K and /1 is a small number (we use 10 -6 ). We evaluate 
x = _Px, where P is the projector on L. After evaluating the Gram matrix K', the following matrix is considered: 

m 2 

K*=5> Wl w7, (8) 

4=1 

where {w} are the eigenvectors of K', and the sum is over the eigenvalues {pi} not smaller than /x times the largest 
eigenvalue of K'. Then we evaluate K = K* PK* - K*P + PK'P, and denote P 1 - the projector onto the ma- 
dimensional range of K. Note that the condition 777,2 = mi + TO3 may not be strictly satisfied in this case (however in 
our experiments we find that the violation of this relation is always very small, if any). The kernel Granger causality 
index for the Gaussian kernel is then constructed as in the previous case, see equation (|6]). 



III. MULTIVARIATE KERNEL CAUSALITY 



Let {^(a) n }n=i,.,JV+m) a = 1, • • ■ , M, be M simultaneously recorded time series. In order to put in evidence the 
drive-response pattern in this system, one may evaluate the bivariate Granger causality, described in the previous 
sections, between every pair of time series. It is recommended, however, to treat the data-set as a whole, thus 
generalizing kernel Granger causality to the multivariate follows. We denote 

X ( c )t = (£(c)i, ■ • ■ , £(c)i+ m -l) T , 

for c = 1, . . . , M and i = 1, . . . , N. In order to evaluate the causality {£(a)} — » {£(&)}, wc define, for i — 1, . . . , N, 

Z t = (X(l)J,...,X(a)J,...,X(M)J) T , 

containing all the input variables, and 

X i = (X(l)J,...,X(M)J) T , 
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containing all the input variables but those related to {£(»)}■ Gram matrices K and K' are then evaluated: ify = 
K(Xi,Xj) and K[- = K(Zi, Zj). The target vector is now x=(£(6) 1+m , . . . , £(&)jv+ m ) T - Along the same lines as in 
the bivariate case, for IP kernel or the Gaussian one, we then calculate the causality index as in equation (6): it 
is denoted 8p (a — > b) and measures the strength of causality a — * b, taking into account all the available variables. 
Repeating these steps for all a and b, the casuality pattern in the data set is evaluated. Note that the threshold for 
the Bonferroni's correction, in the multivariate case, must be lowered by the number of pairs M(M — l)/2. 



A. Three coupled maps 

As an example, we consider the following system of three logistic maps 23]: 

X\ }t = 0.8 (l - ax\ t _ 1 ) + 0.2 (l — axl.i-i) + sr M> 

%2,t = 1 - G#2,t-1 + ST 2,t, (9) 
2:3,4 = 0.8 (l - axj t^) + 0.2 (l - ax\ t _ x ) + sr 3jt ; 

here a = 1.8, s — 0.01 and r's are unit variance Gaussian noise terms. The causal relationships implemented in these 
equations are 2 — > 1 and 1 — > 3. Analyzing segments of length N = 1000, we evaluate both the multivariate causality, 
as described in section 4, and the bivariate causality for all pairs of maps. We use the IP kernel with various values 
of p; the results are displayed in figure |T]). It is well known [ll[ that performing pairwise evaluation for multivariate 
data has the drawback that one cannot discern whether the influence between two time series is direct or is mediated 
by other time series. This is what happens in the present example. Both the multivariate and the bivariate analysis 
reveal the influences 2 — > 1 and 1 — > 3. On the other hand, the bivariate analysis reveals also the influence 2 — > 3, 
which is actually mediated by 1: the multivariate analysis recognizes 2 — * 3 as non-significative. 



IV. ANALYSIS OF DYNAMICAL NETWORKS 



In this section we simulate two dynamical networks and apply the multivariate Granger analysis to estimate the 
topology structure of systems from time series data. 



A. Network of chaotic maps 



Let us consider a coupled map lattice of n nodes, with equations, for i = 1, . . . , n: 

(n \ n 

1 ~ Q i3 i 1 ~ aX lt-l) + Ci i i 1 ~ aX lt-l) + ST M> (10) 
3=1 J 3=1 



where a, s and r's are the same as in equation ([9]), and represents the coupling j — > i. We fix n = 34 and construct 
couplings as follows. We consider the well known Zachary data set [24^ . an undirected network of 34 nodes. We 
assign a direction to each link, with equal probability, and set cy equal to 0.05, for each link of the directed graph 
thus obtained, and zero otherwise. The network is displayed in figure @: here the goal is to estimate this directed 
network from the measurements of time series on nodes. 

The multivariate Granger analysis, described in the previous section, perfectly recovers the underlying network 
using the IP kernel with p = 2, m = 1 and N = 10000. Note that while evaluating 5p (j — > i), for all i and j, 39270 
Pearson's coefficients r are calculated. Their distribution is represented in figure ((3|): there is a strong peak at r = 
(corresponding to projections that are discarded), and a very low number of projections with a rather large value of r, 
up to about r — 0.6. It is interesting to describe the results in terms of a threshold for correlations. Given a threshold 
value r, we select the correlation coefficients whose value is greater than r. We then calculate the corresponding 
causality indexes Sp(j — * i), and construct the directed network whose links correspond to non- vanishing elements 
of Sp(j — * i). In figure (j4|-top we plot the total number of links of the reconstructed network, as a function of 
the threshold r: the curve shows a plateau, around r = 0.1, corresponding to a directed network which is stable 
against variations of r. At the plateau 428 projections are selected, which coincide with those selected by means of 
Bonferroni's test. In figure (|4])-bottom we plot the number of errors (the sum of the number of links that exist in 
the true network and are not recovered, plus the number of links that exist only in the recovered network) versus 
the threshold r: the plateau leads to perfect reconstruction of the original network. We stress that a large number 
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of samples is needed to recover the underlying network: in the typical case we find that the network is perfectly 
reconstructed if N > 5000, whilst if N is further lowered some errors in the reconstructed network appear. Moreover, 
it is important to observe that, although all couplings dj have the same magnitude, the causality strengths 5p (j — > i) 
depend on the link, as it is shown in figure Granger causality is a measure of the information being transferred 
from one time series to another, and it should not be identified with couplings. 

B. Genetic regulatory network 

In this subsection we consider time series from a model of genetic regulatory network made of genes linked by 
weighted connections (inhibitory or excitatory) [25| . The expression levels of all genes, organized in a vector g, evolve 
as follows: 

g t +i = St + A (g t - TI) + S, (11) 

where A is a connectivity strength matrix corresponding to the network, T = 50, I is the identity matrix and £ 
is a vector of random variables uniform in [—10, 10]. The values of g are restricted by floor and ceiling function to 
range in [0, 100]: this constraint provides the nonlinear character of the model. As the simulation runs, multivariate 
data are sampled every t s time steps. Moreover, the continuous data values are discretized into n c categories (with 
equal bin sizes). In [25[ dynamic Bayesian network (DBN) models 26] were trained to data of length N to recover 
the structure of matrix A: the values t s — 5, n c — 3 and N > 2000 were found to lead to the best reconstruction of 
genetic networks by DBN. 

The genetic network we consider here is an example from [25| and consists of ten genes with connections described 
in figure ([6]): there are two independent regulatory pathways, one of which includes a large feedback structure. In 
figure ([7]) the typical curve of expression for a gene in the network is represented (top); the distribution of expressions, 
for the same gene, is bimodal (bottom). We simulate 100 times the system equations, starting from different initial 
conditions, and sample time series of length N — 2000, t s — 5 and rir = 3. The typical recovered network by DBN, 
on this example, corresponds to one missing link (from node 7 to 10) [251 ] . The linear multivariate Granger approach, 
with m = 2, leads to perfect reconstruction of the network in 90 cases out of the 100; we obtain similar results on all 
the examples presented in [25| . Using IP (with p > 1) and Gaussian kernels we obtain similar performances as the 
ones from the linear kernel. 

It is interesting to stress that the possibility that one has to reconstruct the true genetic network depends on the 
sampling rate. In figure (jHJ) we plot the mean number of errors (over 100 realizations) in the reconstructed network 
as a function of t s for the linear kernel and Gaussian kernel: the performance degrades as t s gets far from 5. 

Let us now discuss the case of large t s : all Granger causalities are recognized as non significative. On the other 
hand, at large t s , we find significant static linear correlations between time series of all pairs of genes belonging to the 
same pathway. In other words, referring to figure ([5]), the linear correlations of times series from every pair of genes 
extracted from {1, 2, 3, 4, 8, 9} is significant, as well as the linear correlation for every pair extracted from {5, 6, 7, 10}; 
consistently the linear correlation, for pairs of genes from different pathways, is not significant. We conclude that, at 
large t s Granger causality analysis is not effective, but static correlation analysis may anyway put in evidence groups 
of genes belonging to the same regulatory pathway. 

V. SPARSELY CONNECTED DYNAMICAL NETWORKS AND LIMITED DATA 

The Granger causality approach for dynamical networks here presented requires a large amount of data samples to 
provide trustable answers. However, there are situations (frequent in Bioinformatics) where the number of samples 
is smaller than the number of variables (genes): in these situations the multivariate Granger approach is unfeasible. 
Under the assumption of sparse connectivity, it has been proposed to replace least squares methodology with a 
multivariate approach using minimization with respect to LI norm Q. Here we show that there are situations where, 
even in presence of sparse connectivity and limited data, use of bivariate Granger causality is a better choice w.r.t LI 
minimization. Indeed, in these cases, the statistical robustness of the estimation of information flow between pairs of 
time series may still be good, with the drawback that some causality links, found by the bivariate approach, may not 
be direct but mediated. 

We construct a network of 100 nodes and 100 links using the preferential attachment procedure |27|; we give 
randomly a direction to each link, with 1/2 probabilities, thus obtaining a directed network. Let us denote d(i) the 
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number of nodes from which a link pointing to i starts. We evolve a linear system on this network, with equations: 

V^/0.8\ 

Xi,t = a, Xi,t-i + \dK) J Xj ' t ~ 1 + Ti '*' ' ' 

The sum is over nodes such that j — > i is a link of the network; r's are unit variance Gaussianly distributed noise 
terms; ai is one, if d(i) = 0, and 0.2 otherwise. After a transient, we sample n s consecutive time points, with 
n s = 20, 30, 40, 50, 60. The LI approach we use is the following. For each i = 1, . . . , 100, we find the vector c with 
minimum LI norm, among all those satisfying: 

100 

Xi,t+i = ^ CjXj >t , (13) 
i=i 

t = 1, . . . ,n s — 1. The interaction j — > i is considered significant if the absolute value of Cj exceed a threshold, fixed 
so that the total number of false positive connections is five. Subsequently we apply the bivariate linear Granger 
approach, described in Section IIA, for each pair of nodes: also for Granger's approach we fix a threshold for the 
correlation coefficients r, see equation (01, so that the total number of false positive connections is five. In figure © 
we depict the number of true positive connections found by the two approaches, as a function of n s . It is clear that 
here the bivariate Granger approach outperforms LI minimization. 



A. HeLa gene expression regulatory network 

HeLa is the most famous cell culture line to date (28|. These are cells isolated from a human uterine cervical 
carcinoma in 1951 and used in biomedical research especially to culture viruses. Whilst the patient ultimately died of 
her cancer eight months after the operation, her cells have lived on, still surviving in laboratories today. HeLa cells 
have somehow acquired cellular immortality, in that the normal mechanisms of programmed cell death after a certain 
number of divisions have somehow been switched off. We apply our approach to the HeLa cell gene expression data 
of [2{|. Data corresponds to 94 genes and 48 time points, with an hour interval separating two successive readings 
(the HeLa cell cycle lasts 16 hours). The 94 genes were selected, from the full data set described in [30f , on the basis 
of the association with cell cycle regulation and tumor development. 

First of all, we perform the analysis of the static pairwise correlations between time series: 800 pairs of genes are 
significantly correlated. Drawing a link for each correlated pair leads to an undirected network depicted in figure (|10|) : 
it is clear that there are two modules, and symbols in figure (|10p corresponds to the partition by the method of module 
identification described in (3lj . The first module is made of 23 genes and corresponds to the regulatory network of 
the transcriptional factor NFkB [33|; it contains several well known activators and targets of NFkB [34|, like, e.g., 
A20, ICAM-1, IL-6,VCAM-1, IkappaBa, JunB, MCP-1, FGF2, Cyclin. The second module, 62 genes, appears to be 
orchestrated by transcriptional factors p53 and STAT3. Note, however, that the two modules are not independent, 
as they form a highly related network. The proto-oncongene c-myc appears to be central between the two modules: 
it has 12 significant static correlations with both modules. After the discussion in section IV-B, we assume that the 
modular structure depicted in figure (|10|) is the result of regulatory mechanisms acting on time scales much smaller 
than the sampling time. 

Next, in order to detect causalities acting on the time scale of the sampling time, we apply bivariate Granger 
causality analysis. For all pairs we use the linear version (p = 1, m = 1) of our approach and evaluate the 

Pearson correlation coefficient r, equation for the causality i — > j. Due to the small number of samples, we do not 
use t-test to evaluate the probability ir corresponding to r: we generate a set of surrogates by permuting the temporal 
indices of the i-th times series while keeping fixed those of the j-th time series. The probability ir is identified with 
the fraction of times that an higher coefficient is obtained over 3 x 10 6 random shufflings of time indices of the i-th 
time series. Moreover, we use the False Discovery Rate (FDR) method [HJ instead of Bonferroni's correction. FDR 
works in the following way: the 94 x 93 = 8742 Pearson coefficients are ordered, {r?}, according to their increasing 
irt values, and a parameter q, which controls the fraction of false positive, is set to 0.05. The index £' is identified as 
the largest such that for all £ < £' we have tti < g|§2- Pearson coefficients 77 are accepted for £ < £' . This procedure 
selects 19 causal relationships, out of 8742; they are listed in Table I. IkappaBa is the most abundant inhibitory 
protein for NFkB [35[: our approach detects the significant causality IkappaBa — > NFkB. We find that NFkB is also 
casually related to IAP, an anti-apoptotic gene, and B99 (a direct target for transcriptional activation by p53: here 
no significant interaction between B99 and p53 has been detected). Three causality relationships involve Bcl-xL, the 
dominant regulator of apoptosis (active cell suicide) and TSP1, a peptide shown in some tumor systems to be linked 
with angiogenesis. Notably Table I also contains fibroblast growth factors, FGF7 and FGFR4, the tumor necrosis 
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IkappaBa 


-» 


NFkB 


NFkB 


-» 


B99 


IAP 


— ► 


NFkB 


c-myc 


— ^ 


FGFR4 


TSP1 


— > 


c-myc 


Killer/DR5 


-» 


c-myc 


R2 


-> 


c-myc 


VCAM-1 


-> 


TPD52L 


Bcl-XL 


-» 


OCT4 


A20 


-> 


Bcl-XL 


IRF-2 


-» 


BRCA1 


TSP1 


-» 


Bcl-XL 


Cyclin El 


-> 


E2F-1 


0CT4 


-» 


VCAM-1 


FGF7 




MCP-1 


TPD52L 




TNF-a 


TPD52L 




MASPIN 


PKIG 




ICAM-1 


PKIG 




TSP1 



TABLE I: Causalities for HeLa gene network. 

factor Killer /DR5, the myeloid tumor suppressor gene PKIG, the tumor protein TPD52-L and Cyclin El, a gene 
which is overexpressed in many tumors. In [29j | data have been analyzed with the sparse vector autoregressive model, 
a multivariate LI approach which depends on a regularization parameter, A, fixed by cross-validation. Only one 
causality relationship, out of the 19 in Table I, was revealed also in [29j: A20— *Bcl-XL. 



VI. DISCUSSION 

Our method of analysis of dynamical networks is based on a recent measure of Granger causality between time 
series, rooted on kernel methods, whose magnitude depends on the amount of flow of information from one series 
to another. By definition of Granger causality, our method allows analysis of networks containing cycles. Firstly 
we have demonstrated the effectiveness of the method on a network of chaotic maps with links obtained assigning 
a direction to the edges of the well known Zachary data set, using a nonlinear kernel: perfect reconstruction of the 
directed network is achieved provided that a sufficient number of samples is available. 

Secondly we studied a simulated genetic regulatory network. The results from our method were better than those 
from DBN approach. However our performance was strongly dependent on the sampling time, as it occurred also using 
DBN method. In this example, use of IP kernel, with (p > 2), or Gaussian kernels did not lead to improvement in the 
performance with respect to the linear kernel: this means that these kernel are not suitable to model the nonlinear 
constraint connected to the fact that expressions are confined in [0, 100]. Further work will be devoted to the search 
for kernels capable to capture this kind of nonlinearity: for a given application one should choose the proper kernel 
out of the many possible classes [l6j . 

Then we considered the case of sparse connectivity and limited data. Using an example consisting in a linear 
dynamical network on a graph grown by preferential attachment, we have shown that there are instances where the 
multivariate Granger approach is unfeasible, but the application of bivariate Granger analysis, to every pair of time 
series, leads to better results than those from a method based on LI minimization. Finally we have analyzed a real 
data set of temporal gene expression samples from HeLa cells. The static correlation analysis between time series, 
which is the result of regulation mechanisms with time scaler faster than the sampling rate, revealed the presence of 
two modules. Use of bivariate Granger causality has put in evidence 19 causality relationships acting on the time 
scale of one hour, all involving genes playing some role in processes related to tumor development. Our result on 
HeLa data has very little overlap with those from the output of a method based on multivariate LI minimization, but 
this is not surprising, as we observed the same fact also on the linear dynamical model of Section V, where the true 
connectivity was known. Wc remark that currently available data size and data quality make the reconstruction of 
gene networks from gene expression data a challenge. 

Detecting cause-effects influences between components of a complex system is an active multidisciplinary area of 
research in these years. The kernel approach here presented, provides a statistically robust tool to assess drive-response 
relationships in many fields of science. 
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FIG. 2: The directed network of 34 nodes obtained assigning randomly a direction to links of the Zachary network. 
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FIG. 3: The distribution of the 39270 r- values calculated while evaluating the causality indexes of the coupled map lattice (see 
the text). 




FIG. 4: (top) Concerning the coupled map lattice, the horizontal axis represents the threshold for the values of r; the plot 
shows the number of links of the directed network constructed from the projections whose Pearson's coefficient exceeds the 
threshold, (bottom) The total number of errors, in the reconstructed network, is plotted versus the threshold r. At large r the 
errors are due only to missing links, whereas at large r the errors are due only to links that do not exist in the true network 
and are recovered. 
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FIG. 6: The genetic regulatory network analyzed in the text. Numbers next to links specify regulation strength; arrows: 
excitatory; flat heads: inhibitory. 



13 




20 40 60 80 100 



g 

FIG. 7: Top: the typical curve of expression of a gene in the simulated regulatory network. Bottom: the distribution of 
expressions for the same gene. 
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FIG. 8: The mean number of errors (over 100 realization of the system of length N = 2000) obtained using linear kernel (p = 1, 
empty circles) and Gaussian kernel (stars), is plotted versus the sampling time t s . In the Gaussian case, a — 50 is used. The 
level of discretization is n c = 3 and m = 2. 
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FIG. 9: The true positive connections found by bivariate Granger approach (stars) and by multivariate LI minimization (empty 
circles)on the preferential attachment dynamical network. The error bar represents one standard deviation, evaluated over 50 
realizations. The number of false positive connections is always 5. 





FIG. 10: The undirected network obtained drawing a link between all pairs of significantly correlated genes of the HeLa data set 
(9 genes are not represented here as they are not correlated with any other gene; hence the number of nodes is 85). Squares and 
circles corresponds to the partition in two modules performed by the method described in [3l]]. The large square corresponds 
to the transcriptional factor NFkB. 



